Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation
Justin Domke and Daniel Sheldon, College of Information and Computer Sciences, University of Massachusetts Amherst
Recent work in variational inference (VI) uses ideas from Monte Carlo estimation to tighten the lower bounds on the log-likelihood that are used as objectives. However, there is no systematic understanding of how optimizing different objectives relates to approximating the posterior distribution. Developing such a connection is important if the ideas are to be applied to inference--i.e., applications that require an approximate posterior and not just an approximation of the log-likelihood. Given a VI objective defined by a Monte Carlo estimator of the likelihood, we use a "divide and couple" procedure to identify augmented proposal and target distributions. The divergence between these is equal to the gap between the VI objective and the log-likelihood. Thus, after maximizing the VI objective, the augmented variational distribution may be used to approximate the posterior distribution.
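As a rough illustration of the quantities involved (our notation, not taken verbatim from the paper): let R(ω), with ω ~ q(ω), be a non-negative unbiased Monte Carlo estimator of the likelihood p(x). Jensen's inequality gives the lower bound used as the VI objective, and the divide-and-couple construction expresses its gap as a divergence between augmented distributions.

```latex
% Unbiasedness and Jensen's inequality give the Monte Carlo VI objective:
\mathbb{E}_{q(\omega)}[R(\omega)] = p(x)
  \quad\Longrightarrow\quad
  \mathbb{E}_{q(\omega)}[\log R(\omega)] \le \log p(x).

% Divide and couple identifies an augmented proposal q(\omega, z) and an
% augmented target p(\omega, z \mid x) whose divergence equals the gap:
\log p(x) - \mathbb{E}_{q(\omega)}[\log R(\omega)]
  = \mathrm{KL}\big( q(\omega, z) \,\|\, p(\omega, z \mid x) \big).
% Maximizing the objective therefore drives the augmented variational
% distribution toward the augmented posterior.
```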
Importance Weighting and Variational Inference
Justin Domke and Daniel Sheldon, College of Information and Computer Sciences, University of Massachusetts Amherst
Recent work used importance sampling ideas for better variational bounds on likelihoods. We clarify the applicability of these ideas to pure probabilistic inference, by showing the resulting Importance Weighted Variational Inference (IWVI) technique is an instance of augmented variational inference, thus identifying the looseness in previous work. Experiments confirm IWVI's practicality for probabilistic inference. As a second contribution, we investigate inference with elliptical distributions, which improves accuracy in low dimensions, and convergence in high dimensions.
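As a concrete example of this family of bounds, the importance-weighted objective averages M importance weights inside the logarithm. The sketch below is a generic Monte Carlo estimate of that bound; the function names and arguments are placeholders of our own, not code from the paper.

```python
import numpy as np

def iw_elbo_estimate(log_joint, sample_q, log_q, x, M=16, num_outer=1000):
    """Estimate the importance-weighted bound
       E[ log (1/M) sum_m p(x, z_m) / q(z_m) ] <= log p(x)."""
    outer = []
    for _ in range(num_outer):
        z = sample_q(M)                      # z_1..z_M drawn i.i.d. from q
        log_w = log_joint(x, z) - log_q(z)   # log importance weights, shape (M,)
        # log-mean-exp of the weights is one draw of the estimator
        outer.append(np.logaddexp.reduce(log_w) - np.log(M))
    return float(np.mean(outer))
```

With M = 1 this reduces to the standard ELBO, and the bound tightens toward log p(x) as M grows.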
Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
Finetuning large language models (LLMs) in federated learning (FL) settings has become increasingly important as it allows resource-constrained devices to finetune a model using private data. However, finetuning LLMs using backpropagation requires excessive memory (especially from intermediate activations) for resource-constrained devices. While Forward-mode Auto-Differentiation (AD) can significantly reduce memory footprint from activations, we observe that directly applying it to LLM finetuning results in slow convergence and poor accuracy.
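A minimal sketch of one way forward-mode AD can be used for memory-light gradient estimation, namely a "forward gradient" along a random tangent direction. This is our own illustration under assumed names and a toy loss, not necessarily the exact procedure in the paper.

```python
import jax
import jax.numpy as jnp

def loss(params, batch):
    # Placeholder loss; in LLM finetuning this would be the model's loss on a batch.
    x, y = batch
    pred = jnp.tanh(x @ params)
    return jnp.mean((pred - y) ** 2)

def forward_gradient_step(params, batch, key, lr=1e-2):
    # Sample a random tangent (perturbation) direction v.
    v = jax.random.normal(key, params.shape)
    # One forward-mode JVP gives the directional derivative <grad, v>
    # without storing the intermediate activations a backward pass would need.
    _, dir_deriv = jax.jvp(lambda p: loss(p, batch), (params,), (v,))
    # v * <grad, v> is an unbiased estimate of the gradient when v ~ N(0, I).
    return params - lr * dir_deriv * v
```

A common caveat with such single-direction estimators is their variance, which is consistent with the slow convergence noted above.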
Shlomo Zilberstein wins the 2025 ACM/SIGAI Autonomous Agents Research Award
This prestigious award is made for excellence in research in the area of autonomous agents. It is intended to recognize researchers in autonomous agents whose current work is an important influence on the field. Professor Shlomo Zilberstein was recognized for his work establishing the field of decentralized Markov Decision Processes (DEC-MDPs), laying the groundwork for decision-theoretic planning in multi-agent systems and multi-agent reinforcement learning (MARL). The selection committee noted that these contributions have become a cornerstone of multi-agent decision-making, influencing researchers and practitioners alike. Shlomo Zilberstein is Professor of Computer Science and former Associate Dean of Research at the University of Massachusetts Amherst. He is a Fellow of AAAI and the ACM, and has received numerous awards, including the UMass Chancellor's Medal, the IFAAMAS Influential Paper Award, and the AAAI Distinguished Service Award.
Reasoning and Sampling-Augmented MCQ Difficulty Prediction via LLMs
Wanyong Feng, Peter Tran, Stephen Sireci, and Andrew Lan
The difficulty of multiple-choice questions (MCQs) is a crucial factor in educational assessments. Predicting MCQ difficulty is challenging since it requires understanding both the complexity of reaching the correct option and the plausibility of distractors, i.e., incorrect options. In this paper, we propose a novel two-stage method to predict the difficulty of MCQs. First, to better estimate the complexity of each MCQ, we use large language models (LLMs) to augment the reasoning steps required to reach each option. We use not just the MCQ itself but also these reasoning steps as input to predict the difficulty. Second, to capture the plausibility of distractors, we sample knowledge levels from a distribution to account for variation among students responding to the MCQ. This setup, inspired by item response theory (IRT), enables us to estimate the likelihood of students selecting each option, both correct and incorrect. We align these predictions with their ground-truth values using a Kullback-Leibler (KL) divergence-based regularization objective, and use the estimated likelihoods to predict MCQ difficulty. We evaluate our method on two real-world math MCQ and response datasets with ground-truth difficulty values estimated using IRT. Experimental results show that our method outperforms all baselines, with up to a 28.3% reduction in mean squared error and a 34.6% improvement in the coefficient of determination. We also qualitatively discuss how our method yields higher accuracy in predicting MCQ difficulty.
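To make the IRT-inspired sampling step concrete, here is a toy sketch; all parameter values, function names, and the specific option-scoring form are illustrative assumptions of ours, not the paper's model. The idea is to sample student knowledge levels, turn them into per-option selection probabilities, and penalize disagreement with ground-truth selection rates via a KL term.

```python
import numpy as np

def option_probs(theta, option_scores, correct_idx, difficulty, discrimination=1.0):
    """Toy IRT-flavored model of option selection for one MCQ.
    theta: sampled student knowledge level; option_scores: attractiveness of each option.
    The correct option's score grows with (theta - difficulty)."""
    scores = np.array(option_scores, dtype=float)
    scores[correct_idx] += discrimination * (theta - difficulty)
    e = np.exp(scores - scores.max())
    return e / e.sum()                      # softmax over the options

# Sample knowledge levels to account for variation among students.
rng = np.random.default_rng(0)
thetas = rng.normal(size=1000)
predicted = np.mean([option_probs(t, [0.2, 0.5, 0.1, 0.0], correct_idx=0, difficulty=0.3)
                     for t in thetas], axis=0)

# KL(observed || predicted) regularizer against ground-truth selection rates.
observed = np.array([0.55, 0.25, 0.12, 0.08])
kl = np.sum(observed * (np.log(observed) - np.log(predicted)))
```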
The study of short texts in digital politics: Document aggregation for topic modeling
Nitheesha Nakka, Omer F. Yalcin, Bruce A. Desmarais, Sarah Rajtmajer, and Burt Monroe
Statistical topic modeling is widely used in political science to study text. Researchers examine documents of varying lengths, from tweets to speeches. There is ongoing debate on how document length affects the interpretability of topic models. We investigate the effects of aggregating short documents into larger ones based on natural units that partition the corpus. In our study, we analyze one million tweets by U.S. state legislators from April 2016 to September 2020. We find that for documents aggregated at the account level, topics are more associated with individual states than when using individual tweets. This finding is replicated with Wikipedia pages aggregated by birth cities, showing how document definitions can impact topic modeling results.
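For readers who want to see the aggregation step in code, a minimal sketch with scikit-learn follows; the data frame, column names, and model settings are illustrative placeholders, not the study's pipeline.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# One row per tweet, with the posting legislator's account as the natural unit.
tweets = pd.DataFrame({
    "account": ["@rep_a", "@rep_a", "@sen_b"],
    "text": ["school funding vote today",
             "thanks for joining the town hall",
             "new transportation bill filed"],
})

# Aggregate short documents into one larger document per account.
docs = tweets.groupby("account")["text"].apply(" ".join)

X = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
doc_topics = lda.transform(X)   # one topic distribution per aggregated account
```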
AI scholars win Turing Award for technique that made possible AlphaZero's chess triumph
Some of the flashiest achievements in artificial intelligence in the past decade have come from a technique by which the computer tries out actions from a set of choices and is rewarded or penalized for each helpful or harmful move. It's the technique most famously employed in AlphaZero, the Google DeepMind program that achieved mastery of the games of chess, shogi, and Go in 2018. The same approach helped the AlphaStar program achieve "grandmaster" play in the video game StarCraft II. On Wednesday, two AI scholars were rewarded for advancing so-called reinforcement learning, a very broad approach to how a computer learns to act in an unknown environment. Andrew G. Barto, professor emeritus in the College of Information and Computer Sciences at the University of Massachusetts Amherst, and Richard S. Sutton, professor of computer science at the University of Alberta, Canada, were jointly awarded the 2024 Turing Award by the Association for Computing Machinery.
Andrew Barto and Richard Sutton win Turing Award for AI training trick
Andrew Barto and Richard Sutton have won the 2024 Turing Award, which is often called the Nobel Prize of computing, for their fundamental work on ideas in machine learning that later proved crucial to the success of artificial intelligence models such as Google DeepMind's AlphaGo. Barto, who is now retired and lives in Cape Cod, Massachusetts, didn't even realise he was nominated for the award. "I joined a Zoom with some people and was told and I was…
Pioneers of Reinforcement Learning Win the Turing Award
In the 1980s, Andrew Barto and Rich Sutton were considered eccentric devotees of an elegant but ultimately doomed idea--having machines learn, as humans and animals do, from experience. Decades on, with the technique they pioneered now increasingly critical to modern artificial intelligence and programs like ChatGPT, Barto and Sutton have been awarded the Turing Award, the highest honor in the field of computer science. Barto, a professor emeritus at the University of Massachusetts Amherst, and Sutton, a professor at the University of Alberta, trailblazed a technique known as reinforcement learning, which involves coaxing a computer to perform tasks through experimentation combined with either positive or negative feedback. "When this work started for me, it was extremely unfashionable," Barto recalls with a smile, speaking over Zoom from his home in Massachusetts. "It's been remarkable that [it has] achieved some influence and some attention," Barto adds.